Phase plan (P0/P1/P2) — next 2 weeks
Phase P0 (Days 1–5): v1 schema + state fields end-to-end (blocking)
Objective: add state fields to the canonical snapshot schema and make them queryable/assertable in both SDKs.
Deliverables
sentience-chrome (raw extraction)
- `input_value` (or `value`) for inputs/textarea/select (with redaction rules)
- `input_type` (to support password redaction)
- `checked` / `aria_checked`
- `disabled` / `aria_disabled`
- `aria_expanded`
- `name` (best-effort): `aria-label`, `aria-labelledby`, associated `<label for=...>`, placeholder fallback
- `input_type=password`: omit value or set `value_redacted=true`
- `value` clipped to a max length (e.g., 200 chars) to reduce PII risk + payload bloat
gateway (canonical response schema)
- `gateway/src/snapshot/types.rs`: `Attributes` and/or `RawElement`
- `SmartElement` output schema to include:
  - `name`, `value` (redacted/clipped), `input_type`
  - `aria_checked`, `aria_disabled`, `aria_expanded`
  - `checked`, `disabled`, `expanded`
- `gateway/src/snapshot/processing.rs` mapping: `SmartElement`
sdk-python
- `sentience/models.py::Element` with optional fields matching gateway output.
- `checked=true|false|mixed`
- `disabled=true|false`
- `expanded=true|false`
- `value="..."`, `value~"..."`, `name~"..."` (if exposed)
- `sentience/verification.py` (implemented as predicates over `query(...)`):
  - `is_enabled(selector)` / `is_disabled(selector)`
  - `is_checked(selector)` / `is_unchecked(selector)`
  - `value_contains(selector, substr)` / `value_equals(selector, value)`
  - `is_expanded(selector)` / `is_collapsed(selector)`
sdk-ts
- `src/types.ts::Element` with optional fields matching gateway output.
- `src/verification.ts` mirroring python.
sentience-core (checkpoint)
- `sentience-core` changes?
Tests (P0)
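
The redaction rules and state predicates listed for P0 can be illustrated with a minimal Python sketch. The `Element` class below is a hypothetical stand-in for `sentience/models.py::Element`, the helper `redact` is an invented name, and the predicates take an element directly rather than a selector to keep the example self-contained. The 200-character cap is the example value from the plan, not a confirmed constant.

```python
from dataclasses import dataclass
from typing import Optional

MAX_VALUE_LEN = 200  # example cap from the plan ("e.g., 200 chars"); an assumption


@dataclass
class Element:
    """Hypothetical stand-in for the SDK Element; every state field is optional."""
    role: str
    name: Optional[str] = None
    value: Optional[str] = None
    input_type: Optional[str] = None
    value_redacted: bool = False
    checked: Optional[str] = None      # "true" | "false" | "mixed"
    disabled: Optional[bool] = None
    expanded: Optional[bool] = None


def redact(el: Element) -> Element:
    """Drop password values and clip any other value to MAX_VALUE_LEN."""
    if el.input_type == "password":
        el.value, el.value_redacted = None, True
    elif el.value is not None:
        el.value = el.value[:MAX_VALUE_LEN]
    return el


def is_checked(el: Element) -> bool:
    # "mixed" is deliberately not treated as checked in this sketch.
    return el.checked == "true"


def is_enabled(el: Element) -> bool:
    # A missing `disabled` field counts as enabled, so older payloads still pass.
    return not el.disabled


def value_contains(el: Element, substr: str) -> bool:
    return el.value is not None and substr in el.value
```

Treating absent fields as "unknown, assume default" is one way to keep the new fields optional without breaking callers that receive older gateway responses.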
Phase P1 (Days 6–10): v1 runtime ergonomics + failure intelligence
Objective: make assertions production-grade without requiring Studio.
Deliverables
Recommended API shape: `AssertionHandle.eventually(...)`
Adding `assertEventually()` / `assertDoneEventually()` creates a second “family” of runtime methods. A better UX (closer to Jest/Playwright/Cypress) is:
- `assert_()` / `assert()` behavior unchanged (returns `bool`, emits trace events).
- `AssertionHandle`:
  - `runtime.check(predicate, label=..., required=False)` → `AssertionHandle`
  - `runtime.check(predicate, label, { required })` → `AssertionHandle`
- `AssertionHandle` supports:
  - `.once()` (single evaluation; delegates to existing `assert_()` / `assert()`)
  - `.eventually(...)` (retry loop with fresh snapshots + backoff)
- [...] `runtime.checkDone(...).eventually(...)`), but keep the core retry mechanism shared
Note: In Python, `assert` is a keyword; keep `assert_` naming in the DSL/predicates and runtime method names.
sdk-python (AgentRuntime)
- `AssertionHandle` + `runtime.check(...)` returning it.
- `await handle.eventually(timeout_s=10, poll_s=0.25, min_confidence=0.7, max_retries=...)`.
- `assert_eventually(...)` can remain as a thin wrapper that internally calls `runtime.check(...).eventually(...)`.
- `details`: `no_snapshot`, `no_match`, `match_offscreen`, `match_occluded`, `state_mismatch`
sdk-ts (AgentRuntime)
- `AssertionHandle` + `runtime.check(...)`.
- `await handle.eventually({ timeoutMs: 10_000, pollMs: 250, minConfidence: 0.7, maxRetries })`.
- `assertEventually(...)` can be a thin wrapper over `check(...).eventually(...)` if desired for discoverability.
CLI-first artifacts (both SDKs)
Tests (P1)
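
As a rough illustration of the `AssertionHandle` shape proposed for P1, the sketch below implements `.once()` and `.eventually(...)` with `asyncio`. It assumes a snapshot is a plain dict and that the handle is given a `take_snapshot` coroutine; both are hypothetical simplifications. It polls at a fixed interval instead of backing off, and it omits `min_confidence` and trace events.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

Snapshot = dict  # hypothetical: a snapshot is modeled as a plain dict


class AssertionHandle:
    """Sketch of the handle a runtime.check(...) call could return."""

    def __init__(self, predicate: Callable[[Snapshot], bool],
                 take_snapshot: Callable[[], Awaitable[Snapshot]], label: str):
        self.predicate = predicate
        self.take_snapshot = take_snapshot
        self.label = label

    async def once(self) -> bool:
        """Evaluate the predicate a single time against a fresh snapshot."""
        return bool(self.predicate(await self.take_snapshot()))

    async def eventually(self, timeout_s: float = 10.0, poll_s: float = 0.25,
                         max_retries: Optional[int] = None) -> bool:
        """Retry on fresh snapshots until success, timeout, or the retry cap."""
        deadline = time.monotonic() + timeout_s
        attempt = 0
        while True:
            attempt += 1
            if await self.once():
                return True
            if max_retries is not None and attempt >= max_retries:
                return False
            # Stop if the next poll would land past the deadline.
            if time.monotonic() + poll_s > deadline:
                return False
            await asyncio.sleep(poll_s)
```

Because `.eventually(...)` is built on `.once()`, a `checkDone`-style variant could reuse the same loop and differ only in what it does with the final result.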
Phase P2 (Days 11–14): v2 snapshot confidence/exhaustion + minimal vision fallback
Objective: stop agents failing silently on unstable pages; provide deterministic escalation.
P2.1 Snapshot confidence + exhaustion
sentience-chrome
- `document_ready_state`
- `node_count`
- `quiet_ms` (MutationObserver-based)
- `layout_delta` (if feasible without major overhead)
gateway
- `diagnostics` (instead of `meta`):
  - `confidence` (0..1)
  - `reasons[]`
  - `metrics` (raw metrics above, for debugging)
  - `attempt`, `exhausted` for retry loops
- `ready_state`, `quiet_ms`, `node_count`, and coarse “signal” like interactive element count
- `snapshot_exhausted`
sdk-python + sdk-ts
- `diagnostics` (and keep it optional for backward compatibility).
- `.eventually()` to:
  - `min_confidence`
  - [...] `snapshot_exhausted`) with reasons/metrics
P2.2 Vision fallback (verifier-only, last resort)
sdk-python
- `LLMProvider.generate_with_image(...)`:
  - `supports_vision()` is true
- `.eventually()` after snapshot exhaustion (with an explicit option/flag so callers can enable/disable vision fallback per assertion).
sdk-ts
- `LLMProvider` interface (backward compatible):
  - `supportsVision(): boolean` (default false in base class)
  - `generateWithImage(systemPrompt, userPrompt, imageBase64, options?)`
- `supportsVision=false`
- `.eventually()` after exhaustion (with an explicit option/flag so callers can enable/disable vision fallback per assertion).
Tests (P2)
- `snapshot.diagnostics` is optional and backward compatible.
Next 2–4 weeks (v2 hardening): phased priorities
Phase P0 (Week 3): EscalationPolicy + structured failure events (runtime-level)
Deliverables
- `FailureKind` + `RecoveryAction` enums and emit them in trace + return them to callers.
- `EscalationPolicy` focused on assertion execution (not full agent orchestration):
Tests
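
One possible shape for the `FailureKind`, `RecoveryAction`, and `EscalationPolicy` deliverables is sketched below. The plan does not list the enum members, so the `FailureKind` values are borrowed from the reason codes mentioned earlier and the `RecoveryAction` values are hypothetical. The `allow_vision` flag and `next_action` method are likewise invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailureKind(Enum):
    # Borrowed from the reason codes named in the plan; the real enum may differ.
    NO_SNAPSHOT = "no_snapshot"
    NO_MATCH = "no_match"
    STATE_MISMATCH = "state_mismatch"
    SNAPSHOT_EXHAUSTED = "snapshot_exhausted"


class RecoveryAction(Enum):
    # Hypothetical actions, chosen only for illustration.
    RETRY = "retry"
    VISION_FALLBACK = "vision_fallback"
    FAIL = "fail"


def _default_rules() -> dict:
    return {
        FailureKind.NO_SNAPSHOT: RecoveryAction.RETRY,
        FailureKind.NO_MATCH: RecoveryAction.RETRY,
        FailureKind.STATE_MISMATCH: RecoveryAction.RETRY,
        FailureKind.SNAPSHOT_EXHAUSTED: RecoveryAction.VISION_FALLBACK,
    }


@dataclass
class EscalationPolicy:
    """Maps a failure kind to the next action for a single assertion."""
    allow_vision: bool = False
    rules: dict = field(default_factory=_default_rules)

    def next_action(self, kind: FailureKind) -> RecoveryAction:
        action = self.rules.get(kind, RecoveryAction.FAIL)
        # Vision is opt-in per policy; without it, exhaustion is terminal.
        if action is RecoveryAction.VISION_FALLBACK and not self.allow_vision:
            return RecoveryAction.FAIL
        return action
```

Keeping the policy a pure lookup from failure kind to action makes it straightforward to emit both values in the trace and return them to the caller.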
Phase P1 (Week 3–4): State matcher completeness + normalization
Deliverables
- `aria-pressed`, `aria-selected`, `role=switch`, select/option value
Tests
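
The normalization step for these ARIA states might look like the sketch below. The function names and the output keys (`pressed`, `selected`, `checked`) are assumptions, not the gateway's actual schema. Folding `mixed` into `false` for `role=switch` follows the WAI-ARIA guidance that a switch has only on and off states.

```python
from typing import Mapping, Optional

_BOOL = {"true", "false"}
_TRISTATE = _BOOL | {"mixed"}


def normalize_token(raw: Optional[str], allowed: set) -> Optional[str]:
    """Trim and lower-case an ARIA state string; None if absent or not allowed."""
    if raw is None:
        return None
    value = raw.strip().lower()
    return value if value in allowed else None


def normalize_state(attrs: Mapping[str, str]) -> dict:
    """Collapse raw ARIA attributes into normalized state fields (assumed names)."""
    state = {
        "pressed": normalize_token(attrs.get("aria-pressed"), _TRISTATE),
        "selected": normalize_token(attrs.get("aria-selected"), _BOOL),
        "checked": normalize_token(attrs.get("aria-checked"), _TRISTATE),
    }
    # A switch is on/off only, so this sketch treats "mixed" as "false".
    if attrs.get("role") == "switch" and state["checked"] == "mixed":
        state["checked"] = "false"
    return state
```

Returning `None` for absent or unrecognized values keeps "unknown" distinct from "false", which matters when a matcher such as `checked=false` should not match elements that have no checked state at all.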
Phase P2 (Week 4): Diff-based assertions (action-effect verification)
Deliverables
- `previous_snapshot` in `AssertContext` (both SDKs) and in `AgentRuntime.snapshot()`.
- `diff.added`, `diff.removed`, `diff.modified`, `diff.count_added`, etc.
Tests
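
A minimal sketch of how `diff.added`, `diff.removed`, `diff.modified`, and `diff.count_added` could be derived from `previous_snapshot` and the current snapshot is shown below. It assumes each snapshot can be represented as a dict keyed by a stable element id, which the plan does not specify; `SnapshotDiff` and `diff_snapshots` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SnapshotDiff:
    added: List[str]
    removed: List[str]
    modified: List[str]

    @property
    def count_added(self) -> int:
        return len(self.added)


def diff_snapshots(previous: Dict[str, dict],
                   current: Dict[str, dict]) -> SnapshotDiff:
    """Compare two snapshots keyed by a stable element id (an assumption)."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # An element is modified when it exists in both snapshots but its fields differ.
    modified = sorted(k for k in current.keys() & previous.keys()
                      if current[k] != previous[k])
    return SnapshotDiff(added, removed, modified)
```

An action-effect assertion can then be phrased against the diff, for example checking that clicking a checkbox puts exactly that element's id in `modified`.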
Phase P3 (Week 4): Vision fallback upgrade (optional)
Deliverables